{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "31dc40bf-8ee3-4054-8b72-de591f8f3a70", "metadata": {}, "source": [ "# Deep Reinforcement Learning\n", "\n", "[Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=06-Deep%20Reinforcement%20Learning.ipynb)" ] }, { "cell_type": "markdown", "id": "aa28110f-fb76-436f-ad22-20b6e76cb369", "metadata": {}, "source": [ "In this chapter we combine the previously demonstrated deep learning capabilities of **RLtools** with an (inverted) pendulum simulator that is equivalent to the `Pendulum-v1` environment in [gym/gymnasium](https://github.com/Farama-Foundation/Gymnasium) to train a swing-up control policy. For the training, we use [TD3](https://proceedings.mlr.press/v80/fujimoto18a), an off-policy deep-RL algorithm. TD3 and the supporting data structures and algorithms it requires are integrated in **RLtools**.\n", "\n", "